Why developers are spinning up AI behind your back — and how to detect it. The article discusses the rise of 'Shadow AI' - developers integrating LLMs into production without approval, the risks involved, and strategies for organizations to manage it effectively.
>We’ve seen LLMs used to auto-tag infrastructure, classify alerts, generate compliance doc stubs, and spin up internal search tools on top of knowledge bases. We’ve also seen them quietly embedded into CI/CD workflows...
Running GenAI models is easy. Scaling them to thousands of users, not so much. This guide details avenues for scaling AI workloads from proofs of concept to production-ready deployments, covering API integration, on-prem deployment considerations, hardware requirements, and tools like vLLM and Nvidia NIMs.
A user is seeking advice on deploying a new server with 4x H100 GPUs (320GB VRAM) for on-premise AI workloads. They are considering a Kubernetes-based deployment with RKE2, Nvidia GPU Operator, and tools like vLLM, llama.cpp, and Litellm. They are also exploring the option of GPU pass-through with a hypervisor. The post details their current infrastructure and asks for potential gotchas or best practices.
Arize Phoenix is an open-source observability library for AI experimentation, evaluation, and troubleshooting, built by Arize AI.
K8sGPT is a tool for scanning Kubernetes clusters, diagnosing issues in simple English, and enriching data with AI. It helps with workload health analysis, security CVE review, and more.